Model-Driven Auto-Tuning of Stencil Computations on GPUs
Authors
Abstract
Stencil computations are a class of algorithms that perform nearest-neighbor computations, often on a multi-dimensional grid. This type of calculation forms the basis for computer simulations across almost every field of science. The increasing computational speed of graphics processing units (GPUs) makes them an attractive target for stencil computations. However, achieving highly efficient implementations is often nontrivial, as numerous publications attest. In this work, we propose an analytic performance model for stencil codes on GPUs that delivers close-to-optimal performance without requiring extensive tuning at compile time or run time. We evaluate the effectiveness of our performance model on different stencil benchmarks with various stencil radii.
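To make the terms concrete, the Python sketch below shows what a stencil of radius r computes and how a simple roofline bound relates runtime to that radius. It is an illustration under stated assumptions, not the analytic model proposed in this work. The equal-weight 2D star stencil, the fixed boundaries, the ideal-caching traffic estimate, the helper names `star_stencil_2d` and `roofline_time`, and the device peak numbers are all hypothetical.

```python
"""Sketch: a radius-r 2D star stencil and a roofline-style time bound.

Assumptions (not from the paper): Jacobi-style out-of-place update,
equal weights, fixed boundaries, and a plain roofline bound used in
place of the authors' analytic performance model.
"""


def star_stencil_2d(grid, r):
    """Apply one sweep of a 2D star stencil of radius r (hypothetical helper).

    Each interior point becomes the mean of itself and its 4*r nearest
    neighbours along the x and y axes. Points closer than r to the
    border are copied unchanged (fixed boundary condition).
    """
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # out-of-place: the input is not modified
    npts = 4 * r + 1  # centre plus r neighbours in each of 4 directions
    for j in range(r, ny - r):
        for i in range(r, nx - r):
            acc = grid[j][i]
            for k in range(1, r + 1):
                acc += grid[j][i - k] + grid[j][i + k]  # x-direction neighbours
                acc += grid[j - k][i] + grid[j + k][i]  # y-direction neighbours
            out[j][i] = acc / npts
    return out


def roofline_time(n_points, r, bytes_per_value=8,
                  peak_flops=1e12, peak_bw=5e11):
    """Lower-bound runtime in seconds for one sweep (hypothetical helper).

    Assumes ideal caching: each point is read once and written once
    (2 * bytes_per_value of DRAM traffic per point) and costs 4*r
    additions plus one scaling operation. peak_flops (flop/s) and
    peak_bw (bytes/s) are placeholder device numbers, not measurements.
    """
    flops = n_points * (4 * r + 1)
    bytes_moved = n_points * 2 * bytes_per_value
    # Runtime is bounded by whichever resource saturates first.
    return max(flops / peak_flops, bytes_moved / peak_bw)
```

With these placeholder peaks, the bound is memory-bound for small radii and becomes compute-bound once 4r + 1 exceeds 32 flops per point, that is, from r = 8 onward. A realistic GPU model would also have to account for effects this bound ignores, such as halo reloads between thread blocks and limited on-chip cache capacity.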